On Robust Arm-Acquiring Bandit Problems

Authors

  • Shiqing Yu
  • Xiang Yu
Abstract

In the classical multi-armed bandit problem, at each stage the player chooses one of N given projects (arms) to generate a reward that depends on the arm played and its current state. The state process of each arm is modeled by a Markov chain whose transition probabilities are known a priori. The goal of the player is to maximize the expected total reward. One variant of the problem, the so-called arm-acquiring bandit, studies the case where new projects may arrive at each stage. Another recent extension of the classical bandit problem incorporates uncertainty about the transition probabilities. This robust control problem considers an adversary, “nature”, who aims to minimize the player’s expected total reward by choosing a possibly different transition probability measure each time after the player makes a decision. In this paper, we consider the robust arm-acquiring bandit problem, the combination of the two extensions above, and show that there exists an optimal state-by-state retirement policy. An extension to the robust arm-acquiring tax problem under a certain condition is also presented.
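
To make the retirement-policy formulation concrete, here is a small max-min value iteration for a single arm with a retirement option: a minimal sketch, assuming a finite state space, a finite ambiguity set of transition kernels from which nature picks the worst case, a discount factor, and a retirement reward M. These ingredients and the function name are illustrative, not taken from the paper.

```python
import numpy as np

def robust_retirement_value(rewards, kernels, M, beta=0.9, tol=1e-8):
    """Max-min value iteration with a retirement option (illustrative).

    rewards : (n,) array, reward r(x) for continuing the arm in state x
    kernels : list of (n, n) row-stochastic matrices; nature responds to
              each decision by picking the worst kernel in this set
    M       : scalar retirement reward (collected once, then the game stops)
    beta    : assumed discount factor in (0, 1)
    """
    V = np.zeros(len(rewards))
    while True:
        # Nature minimizes the continuation value over the ambiguity set.
        worst_cont = np.min([P @ V for P in kernels], axis=0)
        # The player either continues the arm or retires for M.
        V_new = np.maximum(rewards + beta * worst_cont, M)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Tiny usage example with two states and two candidate kernels.
r = np.array([1.0, 0.2])
P1 = np.array([[0.9, 0.1], [0.5, 0.5]])
P2 = np.array([[0.6, 0.4], [0.2, 0.8]])
V = robust_retirement_value(r, [P1, P2], M=5.0)
print(V)  # the second state hits the retirement value M: retiring there is optimal
```

The full arm-acquiring problem couples many such arms together with newly arriving ones; the sketch only isolates the continue-or-retire trade-off that a state-by-state retirement policy resolves.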

Similar articles

A Parallel Program for 3-arm Bandits

We describe a new parallel program for optimizing and analyzing 3-arm Bernoulli bandit problems. Previous researchers had considered this problem computationally intractable, and we know of no previous exact optimizations of 3-arm bandit problems. Despite this, our program is able to solve problems of size 100 or more. We describe the techniques used to achieve this, and indicate various extens...

Thompson sampling with the online bootstrap

Thompson sampling provides a solution to bandit problems in which new observations are allocated to arms with the posterior probability that an arm is optimal. While sometimes easy to implement and asymptotically optimal, Thompson sampling can be computationally demanding in large-scale bandit problems, and its performance is dependent on the model fit to the observed data. We introduce bootstr...
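
For reference, the core mechanism described above can be written in a few lines for the Beta-Bernoulli case. This is a minimal standard Thompson sampling sketch, not the bootstrap variant the article introduces; the arm means, uniform priors, and horizon are made up for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = [0.3, 0.5, 0.7]   # hypothetical Bernoulli reward rates
successes = np.ones(3)         # Beta(1, 1) uniform priors
failures = np.ones(3)

for t in range(10_000):
    # One posterior draw per arm; the arm with the largest draw is played,
    # i.e. each arm is chosen with the posterior probability it is optimal.
    samples = rng.beta(successes, failures)
    arm = int(np.argmax(samples))
    reward = float(rng.random() < true_means[arm])
    successes[arm] += reward
    failures[arm] += 1.0 - reward

print(successes / (successes + failures))  # posterior means; the best arm dominates play
```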

A stochastic bandit algorithm for scratch games

Stochastic multi-armed bandit algorithms are used to solve the exploration and exploitation dilemma in sequential optimization problems. Algorithms based on upper confidence bounds offer strong theoretical guarantees, and they are easy to implement and efficient in practice. We consider a new bandit setting, called “scratch-games”, where arm budgets are limited and rewards are drawn without rep...
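
For context, the upper-confidence-bound principle mentioned above looks as follows in its simplest (UCB1) form for the standard stochastic bandit; the scratch-game specifics, namely finite arm budgets and draws without replacement, are deliberately not modeled here, and the arm means are hypothetical.

```python
import math
import random

true_means = [0.2, 0.5, 0.8]   # hypothetical Bernoulli reward rates
counts = [0] * len(true_means)
sums = [0.0] * len(true_means)

for t in range(1, 5001):
    if t <= len(true_means):
        arm = t - 1            # play each arm once to initialize
    else:
        # UCB1 index: empirical mean plus an exploration bonus that
        # shrinks as an arm accumulates plays.
        arm = max(range(len(true_means)),
                  key=lambda i: sums[i] / counts[i]
                  + math.sqrt(2.0 * math.log(t) / counts[i]))
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    counts[arm] += 1
    sums[arm] += reward

print(counts)  # the best arm ends up with most of the plays
```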

On the Optimal Amount of Experimentation in Sequential Decision Problems

We provide a tight bound on the amount of experimentation under the optimal strategy in sequential decision problems. We show the applicability of the result by providing a bound on the cut-off in a one-arm bandit problem.

Publication date: 2014